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Abstract. We consider the inference of the cosmic radiation density, traditionally 
parameterised as the effective number of neutrino species N e g, from precision 
cosmological data. Paying particular attention to systematic effects, notably scale- 
dependent biasing in the galaxy power spectrum, we find no evidence for a significant 
deviation of N c g from the standard value of N® s = 3.046 in any combination of 
cosmological data sets, in contrast to some recent conclusions of other authors. The 
combination of all available data in the linear regime prefers, in the context of a 
"vanilla+A^eff" cosmological model, 1.1 < N e g < 4.8 (95% C.L.) with a best-fit 
value of 2.6. Adding data at smaller scales, notably the Lyman-a forest, we find 
2.2 < Neg < 5.8 (95% C.L.) with 3.8 as the best fit. Inclusion of the Lyman-a data 
shifts the preferred N c g upwards because the ag value derived from the SDSS Lyman-a 
data is inconsistent with that inferred from CMB. In an extended cosmological model 
that includes a nonzero mass for N e s neutrino flavours, a running scalar spectral index 
and a w parameter for the dark energy, we find 0.8 < N e g < 6.1 (95% C.L.) with 3.0 
as the best fit. 
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1. Introduction 

The observed global properties of the universe can be remarkably well described by 
the ACDM model in conjunction with simple initial conditions for the primordial 
density fluctuation spectrum. In its simplest form the model is geometrically flat and 
represented by nontrivial values for six key parameters: the baryon density, the dark 
matter density, the Hubble parameter, the amplitude and spectral index of primordial 
adiabatic scalar fluctuations, and the optical depth to reionisation. No single additional 
parameter provides a substantially better fit to currently available data, a situation 
summarised by Max Tegmark's dictum, "vanilla rules ok" [1]. 

There are however many ways to extend this vanilla model, some of which are 
physically well-motivated, such as a nontrivial equation of state p = wp for the dark 
energy, or a running spectral index for the spectrum of primordial density fluctuations. 
An extension with a nonvanishing hot dark matter component is actually unavoidable 
because neutrinos are known to have mass and the current direct laboratory limits 
are so loose that neutrino hot dark matter could easily play an important role. Many 
authors have sought to constrain neutrino masses in the context of ACDM cosmology 
by inference from cosmological data, and found no evidence for a nonvanishing value on 
the level of precision that can be achieved with existing data. 

Another extension invokes a nonstandard radiation density, traditionally param- 
eterised by the effective number N cS of neutrino species, with N® s = 3.046 being the 
standard value [2]. This tradition dates back to the time before LEP at CERN measured 
the number of ordinary neutrino species to be 3 and big bang nucleosynthesis (BBN) 
provided the only significant upper limit on the number of particle families. Today, 
constraining N c s with cosmological data is primarily a consistency test of standard 
particle physics with concordance cosmology and of concordance cosmology with itself 
because one can compare the radiation density allowed by BBN with that implied by 
precision cosmological data which probe physics at different epochs. 

This exercise has been performed by several groups before [3-8] and after [9-15] 
the release of the WMAP 3-year data [15-17]. Some of these recent results suggest 
surprisingly large values for N eS , with 95% C.L. intervals that do not always include the 
standard value N® s = 3.046 [9,13, 15]. The apparent conflict of these results and the 
exciting possibility of a deviation from the minimal cosmology has motivated us to re- 
examine the cosmological N e g determination. Our goals are two- fold: first, to identify 
the source of discrepancy in previous analyses, and second, to provide an up-to-date 
estimation of N e s within more general model frameworks. 

One possible source for the overestimation of N c s is an incorrect statistical 
methodology. The popular software GetDist, an analysis package frequently used 
in conjunction with the Monte Carlo Markov Chain generator CosmoMC [18, 19] for 
cosmological parameter estimation, provides by default ID error estimates based on the 
central rather than the minimal credible interval, although the latter is more meaningful 
for inference problems. These constructions differ significantly for skewed distributions, 
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but become identical in the Gaussian limit. We find that this effect can indeed be 
significant if one uses a small number of data sets that are not very constraining, since 
in these cases the ID marginal posterior distribution for N e g often has a long tail towards 
large N c g values as a result of strong degeneracies with other parameters. However, when 
many data sets are combined and conspire to remove these degeneracies, the ID posterior 
for iV eff usually becomes narrow enough to approach the Gaussian limit. Therefore the 
different error construction methods are probably not the main source of discrepancy. 

The two main problems we have identified that affect the determination of N e g 
are (i) an unusually large fluctuation amplitude reconstructed from the Lyman-o; forest 
data [20] relative to that inferred from WMAP, and (ii) the treatment of scale-dependent 
biasing in the galaxy power spectrum inferred from the main galaxy sample of the Sloan 
Digital Sky Survey data release 2 (SDSS-DR2) [21, 22]. The first issue is well known, and 
its complete investigation — involving elaborate astrophysical modelling — is beyond the 
scope of the present work. The second issue is more subtle. In previous analyses, scale- 
dependent biasing in SDSS-DR2 has either been ignored [15], or treated with empirical 
correction formulae under overly restrictive conditions [9, 13]. We will explain this issue 
in more detail in section 4 below. Here we anticipate that no exotic values for N e s 
will be found if one either avoids small-scale data altogether or if one avoids artificially 
constraining assumptions about the extent of the scale dependence. 

To derive our estimate for N c s we begin in section 2 with a description of our 
cosmological parameter framework, and in section 3 the cosmological data to be used. 
In section 4 we discuss the problem of galaxy bias and its scale dependence. In section 5 
we compare different statistical inference methods frequently encountered in the context 
of cosmological parameter estimation, and the way they provide "best-fit parameters" 
and associated error estimates. In section 6 we study iV eff in a minimal cosmological 
model which has a nonstandard radiation density as the only extension to vanilla 
cosmology. We use this simple scenario as a benchmark to compare results from different 
combinations of data and with different statistical methods. In section 7 we consider an 
extended model that includes as free parameters also a constant dark energy equation 
of state parameter, a running spectral index, and neutrino masses. In the framework 
of standard Bayesian statistics we provide credible intervals for N e g. In section 8 we 
summarise our findings. 



2. Cosmological models 

We perform our inference in the framework of a cosmological model with vanishing 
spatial curvature and described by eleven free parameters, 

= {uJdm,Ub,Ho,T,\n(l0 10 A s ),n s ,f l ;,N m N cS ,w,a s }. (2.1) 
s , 

vanilla 

Here, the physical dark matter density u^m — ^dmh 2 , the baryon density = Qf,h 2 , the 
Hubble parameter H = h 100 km s -1 Mpc -1 , the optical depth to reionisation r, the 
amplitude A s , and the spectral index n s of the primordial scalar power spectrum are 
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Table 1. Standard values and priors for our cosmological fit parameters. Prior 2 is 
identical to prior 1 except for the Hubble parameter. All priors are uniform in the 
given intervals (i.e., top hat). Depending on the investigated scenario, we use either 
the standard value, or one of the priors for each parameter. 



Parameter 


Standard 


Prior 1 


Prior 2 






0.01-0.99 








0.005-0.1 




h 




0.2-2.0 


0.4-1.0 


T 




0.01-0.8 




mtlO 10 ^) 




2.7-4.0 




n s 




0.5-1.5 




u 





0-0.5 




N m 





0-50 




N cS 


3.046 


0-50 




w 


-1 


-2-0 




a s 





-0.2-0.2 





collectively labelled the "vanilla" parameters. They represent the simplest parameter 
set necessary for a consistent interpretation of currently available data. 

The next three parameters denote a nonzero neutrino fraction /„ = f^/fW of the 
present day dark matter content, the number N m of massive neutrino species, assuming 
a common mass value m v for all of them, and the total effective number iV e fj of massless 
plus massive neutrinos. Of course, iV e ff can also include other forms of radiation. With 
these definitions, N m enters the present-day energy density as 

2 = Nrani^ = Emu , v 

93eV 93eV V ; 

During the radiation-domination epoch the total energy density is 



7T 



2 



30 



t: 



2 + 2 x -N t 



7 



eff 




(2.3) 



where T 7 and T v are the photon and neutrino temperatures respectively. 

The last two parameters in equation (2.1) represent a constant equation of state 
parameter for the dark energy w, and a running parameter a s in the scalar power 
spectrum defined at the pivot scale k = 0.002 Mpc -1 . 

The vanilla cosmological model is defined by holding all non-vanilla parameters 
fixed at their standard values given in table 1. In the same table we also show the priors 
assumed for all cosmological fit parameters. We shall consider several scenarios, each 
including N e $ as a free parameter. 

Minimal model 



Our minimal model (section 6) has seven free parameters, namely, vanilla+iV eff , while 
the other parameters are fixed at their standard values. In particular, all neutrinos are 
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assumed to be massless. Most constraints on N e g in the recent literature were derived 
within this framework [9,11-13,15]. Therefore, the minimal model lends itself as a 
benchmark case to the study of differences and similarities between our results and 
those of previous authors, as well as differences between different analysis methods. 

Extended models 

As in the minimal model, our extended models (section 7) always include the vanilla 
parameters and iV e fj. In addition, we include neutrino masses and hence the parameter 
f u . Extended models with j v as a free parameter were also considered in Refs. [3, 7, 8, 10]. 
However, there are many different ways to incorporate neutrino masses into the analysis. 
We shall consider two scenarios. In the first, we assume that all degrees of freedom 
represented by N eS have equal mass m u , i.e., N m = N cS . An increased effective number 
density of ordinary neutrinos could be due to, for example, a chemical potential in the 
neutrino phase space. :(: 

A second way to include neutrino masses, to be denoted 3 f u , is to fix N m = 
N® s = 3.046, i.e., the standard density of ordinary neutrinos, each with a mass m„, 
is guaranteed. The remaining N c g — N® s species are massless degrees of freedom that 
truly represent radiation; we do not assume anything about its physical nature. The 
prior N® s < N cS < 50 will be used in this case. 

In both cases we consider also more elaborate scenarios in which w and a s are 
treated as free parameters, motivated by the well-known degeneracies between N e s and 
/„ [3], and between N eS and w [23]. Studying these larger models and comparing them 
with simpler ones illustrates how well combinations of different data sets can break these 
degeneracies. 

3. Data 

3.1. Cosmic microwave background (CMB) 

We use CMB data from the Wilkinson Microwave Anisotropy Probe (WMAP) exper- 
iment after three years of observation [15-17]. The data analysis is performed using 
version 2 of the likelihood calculation package provided by the WMAP team on the 
LAMBDA homepage [24]. 

3.2. Large scale structure (LSS) 

The large scale matter power spectrum has been inferred from the galaxy clustering data 
of the Sloan Digital Sky Survey (SDSS) [1,21,22,25] and the Two-degree Field Galaxy 
Redshift Survey (2dF) [26]. In particular, the luminous red galaxies (LRG) sample 



| Technically, even though a chemical potential does increase the neutrino number density, our 
treatment does not fully cover this case because it entails a neutrino velocity dispersion different from 
the standard non-degenerate Fermi-Dirac distribution. 
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from the recent SDSS data release 5 (DR5) supersedes all previous power spectrum 
measurements in terms of statistical significance [1,25]. However, the "old" spectrum 
retrieved from the SDSS main galaxy sample from data release 2 (SDSS-DR2) [21,22] 
is still drawing attention, primarily because the parameter estimates inferred therefrom 
appear to be in conflict with those derived from other probes. We shall therefore analyse 
this data set as well. As it turns out, the apparent discrepancy can be explained in terms 
of scale-dependent bias (section 4). 

3.3. Baryon acoustic oscillations (BAO) 

The baryon acoustic oscillations peak has been measured in the SDSS luminous red 
galaxy sample [27]. We use all 20 points in the two-point correlation data set 
supplied in Ref. [27] and the analysis procedure described therein, including power 
spectrum dewiggling, nonlinear corrections with the Halofit package [28] , corrections 
for redshift-space distortion, and analytic marginalisation over the normalisation of the 
correlation function. Except for the last marginalisation, these corrections are applied 
largely for cosmetic reasons; we obtain essentially the same results even without them. 

3.4- Type la supernovae (SNIa) 

We use the luminosity distance measurements of distant type la supernovae provided 
by Davis et al. [29]. This sample is a compilation of supernovae measured by the 
Supernova Legacy Survey (SNLS) [30], the ESSENCE project [31], and the Hubble 
Space Telescope [32], as well as a set of 45 nearby supernovae. In total the sample 
contains 192 supernovae. 

3.5. Hubble space telescope key project (HST) 

In some cases we use the direct measurement of the Hubble parameter from the HST 
key project, H = 72 ± 8 km s" 1 Mpc" 1 [33]. 

3.6. Lyman-a forest (Lya) 

Measurements of the flux power spectrum of the Lyman-a forest has been used to 
reconstruct the matter power spectrum on small scales at large redshifts. By far the 
largest sample of spectra comes from the SDSS survey. This data set was carefully 
analysed in McDonald et al. [20] and used to constrain the linear matter power 
spectrum. The derived linear fluctuation amplitude at k — 0.009 km s _1 and z = 3 is 
A 2 = 0.452^o;q(3, and the effective spectral index n cff = —2.321 jlQ^I. These results were 
derived using a very elaborate model of the local intergalactic medium in conjunction 
with hydrodynamic simulations. 

While the Lya data provides in principle a very powerful probe of the fluctuation 
amplitude on small scales, the question remains as to the level of systematic uncertainty 
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in the result. The same data has been reanalysed by Seljak et al. [9] and Viel et al. [34- 
36], with somewhat different results. Specifically, the normalisation found in Refs. [34- 
36] is lower than that reported in Ref. [20]. 

We shall use the default Lya module provided in the CosmoMC package in 
some parts of our analysis. This module uses the SDSS-Lya data based on McDonald 
et al. [20], and does not support the parameters f v , w and a s in our extended models 
(it does support N eS , however). Therefore, the Lya data will be analysed only in the 
context of the minimal model. 

We stress that our Lya results would likely be somewhat different if the Viel et al. 
analysis of SDSS-Lya had been used. However, when all available cosmological data sets 
are used in combination, the Lya data carries relatively little weight in the combined 
fit for N c s and is not crucial for our conclusions. 

4. Scale-dependent bias 

The conventional wisdom behind using galaxy survey data to infer the underlying matter 
distribution is that, on sufficiently large scales, the galaxy power spectrum P g traces that 
of the total matter content P m calculated from linear theory up to a constant, scale- 
independent bias factor, 

P g (k) = b 2 P« n (k). (4.1) 

This relation is of course not exact, and its region of applicability limited. On sufficiently 
small scales we expect nonlinear evolution to cause its breakdown. 

One obvious source for correction is the nonlinear growth of the underlying matter 
density field on scales k > k n \ ~ 0.15 h Mpc" 1 . Another is the violation of scale 
independence for the galaxy bias. The latter arises from the fact that galaxy formation 
takes place preferentially in dark matter halos with certain optimal masses, which are 
themselves biased tracers of the matter distribution [37, 38] . Indeed, depending on the 
galaxy morphology, theoretical modelling and numerical simulations suggest that the 
galaxy bias can deviate markedly from scale independence already at nominally linear 
scales k ~ 0.1 h Mpc -1 [39,40]. The problem this presents to cosmological parameter 
estimation is immediate: power spectrum measurements on scales in the vicinity of 
k ~ 0.1 h Mpc -1 carry substantial weight in statistical inferences because of their 
small formal error bars. Improper handling of the galaxy bias will therefore likely yield 
misleading results, a point we discuss in more detail below. 

Unfortunately, neither theoretical modelling nor simulations are as yet able to 
accurately predict the galaxy bias and its scale dependence. In the meantime, we have 
the option to either (i) cut the data at a suitably small fc max , usually A; max < 0.1 h Mpc -1 , 
or, if we want to use more data points, (ii) introduce some fitting formula that 
models crudely the effect of a scale-dependent bias and then marginalise over the 
associated nuisance parameters. For the latter approach and in the framework of ACDM 
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cosmologies, Ref. [26] suggests the formula 




(4.2) 



where A g — 1.4 is fixed, and b and Q n \ are free parameters to be marginalised. While the 
issue of bias correction was not explored in the parameter estimation analysis of SDSS- 
DR2 [22], both options (i) and (ii) were considered in the context of the vanilla model 
by the 2dF [26] and the SDSS-DR5 [1] teams in their respective analyses. Both analyses 
found that, after marginalisation over Q n \, additional data beyond k ~ 0.1 h Mpc^ 1 in 
option (ii) lead to no significant deviation in the cosmological parameter estimates or 
improvement in the errors compared to those obtained with the simpler option (i). 

Conversely, if we ignore the issue of scale-dependent bias and adhere strictly to 
the relation (4.1), then it has been shown that the 2dF- inferred Vt m tends towards 
higher values with increasing A; max [26]. More strikingly, analyses of the SDSS-DR5 
data show that the best-fit Vt m values inferred on scales 0.01 < k/(h Mpc^ 1 ) < 0.06 and 
0.01 < k/ (h Mpc -1 ) < 0.15 differ by 2-3<x under the constant bias assumption (4.1) [25]. 
Significant scale dependence in the galaxy bias has been put forward to explain the 
apparent tension between the galaxy power spectra measured by 2dF and SDSS, the 
latter of which tends to select the more strongly-biased red galaxies [25,41]. For the 
purpose of constraining a possible nonstandard radiation density, we note that the well- 
known degeneracy between iV eff and fl m means that any inference of N eS will be highly 
sensitive to how we handle the bias issue, a point also raised in Ref. [12]. We consider 
both a conservative and a more speculative approach. 

Conservative approach: LSS-lin 

In the conservative approach, we use power spectrum data only on scales that are safely 
linear, 

• 2dF-lin, k max ~ 0.09 h Mpc" 1 (17 bands), 

• SDSS-DR2-lin, k max ~ 0.06 h Mpc' 1 (11 bands), and 

• SDSS-LRG-lin from DR5, k max ~ 0.07 h Mpc" 1 (11 bands). 

The combined set of these data is denoted LSS-lin. We adopt the constant bias 
assumption (4.1) for each data set, and marginalise over each of the three bias parameters 
b 2 with a flat prior. 

Speculative approach: LSS-Q 

In the speculative approach, we use data sets collectively denoted as LSS-Q that include 

• 2dF-Q, k max ~ 0.15 h Mpc" 1 (32 bands), 

• SDSS-DR2-Q, k max ~ 0.1 h Mpc^ 1 (14 bands), and 

• SDSS-LRG-Q from DR5, k max ~ 0.2 h Mpc" 1 (20 bands), 
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with k max values chosen to conform with the analyses of Refs. [1] and [15]. Here, we use 
the bias correction formula (4.2) and marginalise over each set of b 2 and Q n \ with flat 
priors. § Our motivation for caution in this case owes itself to the fact that the formula 
(4.2) was originally developed and calibrated for ACDM cosmologies; there is a priori 
no guarantee that it would apply also to nonstandard models. 

We note that Seljak et al. [9] and Mangano et al. [13] also used the bias correction 
formula (4.2) on the SDSS-DR2 data. However, they adopted a Gaussian prior on Q Q i 
of 10 ± 5 that is predetermined from numerical simulations. As we shall see, this choice 
tends to bias their results towards large values of N eS . We believe this is the main origin 
of the discrepant N e s values reported by different groups. 

5. Statistical inference 

5.1. Bayesian inference 

We use standard Bayesian inference techniques, and explore the model parameter space 
with Monte Carlo Markov Chains (MCMC) generated using the publicly available 
CosmoMC package [18,19]. 

Given a set of data x, a direct probabilistic interpretation for the degree of belief in 
the parameters 6 of an assumed underlying model is given by the posterior probability 
distribution 



Here, the likelihood function L(x\6) quantifies the agreement of the data with an 
assumed set of parameter values, while the prior probability tc(0) represents our belief 
in what the true parameter values should be before any data is taken. This inherent 
subjectivity of Bayesian inference is a point of much criticism. A pragmatic approach 
is to employ uniform priors and "let the data decide". However, this approach is not 
entirely free of subjectivity, particularly when it comes to credible interval construction 
and marginalisation (section 5.4). 

5.2. Point estimates 

The posterior probability P(6\x) serves as the starting point for any further inference. 
A natural point of reference is the posterior mode 



representing the most probable parameter values given the data and priors. Note that 
we sometimes refer to the posterior mode as the "best-fit", although strictly speaking 
the term refers to those parameter values that maximise the likelihood and is equivalent 

§ Some recent analyses use a Gaussian prior of Q n \ = 4.6 ± 1.5 when fitting the 2dF data. We point 
out that these numbers are in fact derived from the 2dF data itself [26]. We feel it is inconsistent to 
feed them back into a fit as a prior. 




(5.1) 



= arg maxP(0|:E) , 



(5.2) 
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to 6 only for uniform priors. Another commonly used point estimate is the posterior 
mean or "expectation value" 




For one-dimensional distributions, one may also define the median # me d, where 50% of 
the posterior's volume lie on either side. 

5.3. Credible intervals 

In addition to point estimates one needs credible regions in parameter space that express 
the degree of uncertainty in the inference. A closed but not necessarily connected 
hypersurface dA y , called a 1007% credible region, can be constructed such that the 
hypervolume contains a fraction 7 of the total volume beneath P(6\x), 



This definition is not unique. In the ID case, two popular choices are 

• Central credible interval (CCI) The credible interval [#i ,#hi] means that equal 
fractions (1 — 7)/2 of the posterior's volume lie in (—00, 9\ Q ) and (6*hi, 00). The CCI 
is always connected and contains the median 9 nie &. 

• Minimum credible interval (MCI) For a unimodal distribution, 6> lo and 9 hi are 
chosen to minimise 9^ — 9\ . This amounts to placing [9\ Q , #hi] around the peak 
of the posterior. In general the posterior may be multimodal, and the MCI is 
constructed such that the posterior at any point inside is larger than that at any 
point outside. The MCI need not be connected, but always includes the mode 9. 

These constructions coincide only under special circumstances, e.g., if the posterior 
probability is Gaussian with respect to 9. The top two panels of figure 1 show realistic 
examples of a CCI and an MCI that are very different. 

Which of these constructions should we adopt? Since our goal is to find the most 
probable set of parameter values, we believe that the MCI is more adequate because 
it singles out regions of parameter space with the highest probability densities. In 
particular, the MCI always includes the "best-fit" parameter (more precisely, the mode). 
Finally, for multidimensional posteriors, only the MCI is uniquely defined. 

We discuss these matters in such detail because CosmoMCs popular companion 
package GetDist outputs for ID intervals a CCI, not an MCI, a property that does 
not always seem to be recognised. Moreover, under the default settings, GetDist does 
not output the median # me d, the point estimate naturally associated with the CCI, but 
rather the expectation value (9). 

5.4- Marginalisation of the posterior 

For multi-parameter models typically encountered in cosmology, the information carried 
by the mult i- dimensional hypersurface cM 7 is often not useful in practice and must be 




(5.4) 
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Figure 1. The ID marginal (red/solid) and profile (blue/dotted) posteriors with 
respect to N e g for our minimal model, the data set WMAP+SDSS-DR2-lin and top 
hat prior 0.2 < h < 2.0. The shaded regions are, from top to bottom, the Bayesian 
68% central credible interval, the 68% minimum credible interval, and the la interval 
derived from maximisation. The dashed vertical lines mark, from top to bottom, 
the posterior mean (N e g), the ID marginal posterior mode nQ , and the global 
best fit N e g. 

"compressed." It is common to map the posterior probability P(0\x) onto a lower- 
dimensional subspace by the process of marginalisation, 

P£l se (0 (n) ) « / d9 n+1 ...d6 N P(0\x), (5.5) 

where 0^ = (Oi, . . . ,0 n ) represents the parameters in the n-dimensional subspace. 
Point estimates for 0^ and credible regions may then be constructed from the marginal 
posterior probability in analogy to section 5.3 above. 

Marginalisation favours regions of parameter space that contain a large volume 
of the probability density in the marginalised directions. This "volume effect" can 
sometimes lead to counter-intuitive results, such as suppression of the probability density 
for the global best fit parameters if they appear within sharp peaks or ridges that 
contain little volume. Moreover, the concept of volume itself depends on the choice 
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of parameters. For example, a flat prior on a parameter or one on its logarithm have 
completely different effects on the volume in that parameter direction. Therefore, other 
methods of mapping the multi-dimensional posterior onto a lower-dimensional space can 
be useful. 

5. 5. Maximisation of the posterior 

A complementary approach to marginalisation is to project P(0\x) onto the n- 
dimensional subspace 6^ by maximising along the remaining directions, 



the true peak of the original iV-dimensional posterior probability and hence the global 



in juxtaposition. 

In addition, we introduce an effective chi-square measure for the goodness-of-fit 
relative to the global best fit, 



For n — 1, we define loosely the "la" and "2<r" intervals as the ID regions satisfying 
respectively A%g ff < 1 and Axg ff < 4. We emphasise that these intervals have no formal 
probabilistic interpretation. However, they do provide a raw assessment, unplagued by 
volume effects, of how well a given parameter value agrees with the data relative to the 
global best fit, and have the virtue of being invariant under reparameterisation of the 
model. Of course, if P^ oi / marge (9) is Gaussian, then the la and 2a intervals thus derived 
coincide with the ID marginal 68% and 95% minimum and central credible regions [42]. 
Maximisation was used in some recent studies of cosmological N e g inference [7, 8, 10-12]. 

For simplicity our maximisation intervals are extracted from the same MCMC 
chains used to construct the Bayesian credible intervals. However, we caution that 
MCMC techniques are strictly speaking not designed for this purpose; there exist 
sophisticated optimisation methods such as simulated annealing that are much better 
suited to the task. 

The bottom panel of figure 1 shows a realistic example of a one-dimensional la 
interval constructed according to equation (5.7). For a very non-Gaussian situation 
such as depicted in this figure, the point estimates and corresponding credible intervals 
derived by the methods discussed here are very different. 

6. Constraints in the minimal model 

6.1. Numerical results 




best fit 6. Figure 1 shows a realistic example of a ID marginal and a ID profile posterior 




(5.7) 



To study the impact of different statistical methodologies and of different combinations 
of data sets, we use the minimal model (i.e., vanilla+A^fr) as a benchmark case. Each 
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entry in table 2 gives a point estimate and the lower and upper ends of the appropriate 
68% and 95% credible intervals for N c g. The first column indicates the combinations 
of cosmological data sets. To illustrate the strong degeneracy between jV e ff and the 
Hubble parameter h in some data sets and its consequences, we have used two different 
top-hat priors: the loose prior 1 (0.2 < h < 2.0) and the more constraining prior 2 
(0.4 < h < 1.0). 

In the columns showing the Bayesian central credible interval, we use the posterior 
mean (N eS ) as a point estimate, which is the default output of GetDist. The Bayesian 
minimum credible interval is derived from the ID marginal posterior probability 
distribution for N e g and the corresponding point estimate is the ID marginal posterior 
mode Ngff ■ In the case of maximisation, the point estimate is the global best fit 
N e ff. Here, the associated intervals are the effective la and 2a regions defined by 
equation (5.7). 

6.2. Interpretation of statistics 

To compare estimates from different inference schemes, consider first the top half of 
table 2. The posterior mean and the CCI, i.e., the default output of GetDist, show 
a preference for large iV e ff for almost all combinations of probes. The combinations 
WMAP, WMAP+SDSS-DR2, and WMAP+SNIa, in particular, appear to disfavour 
the standard value N eS = 3.046 at more than 68% (prior 1). However, any evidence for 
N e g > 3.046 disappears as soon as we impose the tighter prior 2 on h. This trend stems 
from the iV efr -/i-degeneracy which leads to a long tail of high N eS in the ID marginal 
posterior (figure 1). The tail in turn pushes the posterior mean and the CCI to larger 
N e fi values. Imposing a tighter prior on h suppresses the tail and reduces this effect. 

In contrast, the ID marginal posterior mode and the global best fit N e s pick 
out the parameters with the highest probability densities, and turn out to be insensitive 
to the choice of h prior. The tail region still has a strong impact on the upper MCI 
limits, but the lower limits are relatively unaffected. The N eS constraints from WMAP 
in table 2 provide an excellent illustration of this point. 

The la and 2a intervals from maximisation depend even less on the h prior, since 
this construction makes no reference to the volume of the posterior and is therefore 
insensitive to tail regions once the ID profile posterior drops below e~ 2 relative to the 
peak. As argued earlier (section 5.3), in Bayesian inference only the MCI provides a 
meaningful answer to the question, what are the most probable values of N c g implied 
by the data. Our explicit examples show that inference based on the CCI, the default 
output of GetDist, can lead to incorrect conclusions. 

6.3. Scale- dependent bias 

Turning to the issue of bias in the galaxy power spectrum, we see in table 2 that 
the two different measures introduced in section 4 to bypass or account for the scale 
dependence, namely, using only linear data at k < 0.1 h Mpc -1 , or adopting the bias 
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Table 2. Point estimates and credible intervals (68% and 95%) for N e g in our 
minimal model "vanilla+ N c g " . The priors for the free parameters are given in 
table 1. Priors 1 and 2 differ only for the Hubble parameter. We consider also 
two large combinations of data sets, All-lin = WMAP+BAO+SNIa+LSS-lin and 
All-Q = WMAP+BAO+SNIa+LSS-Q. 



Bayesian CCI 

68%T, 95%1 
68%i, 95%i 



lAT \ 68%T, 95%T 



Bayesian MCI 

ft(l) 68%T, 95%T 
iV off 68%|, 95% I 



Maximisation 

ft 1*1, 2crT 



Data 



Prior 1 



Prior 2 



Prior 1 



Prior 2 



Prior 1 



Prior 2 



WMAP 



22 



37, 46 
7.3, 2.6 



5.8 



8.8 
3.0 



11 
1.5 



6.8 



32, 45 
2.8, 1.5 



4.2 



7.9, 11 
2.2, 1.2 




+SDSS-DR2-Q 


14 


26, 37 
3.6, 1.2 


4.8 


+SDSS-DR2-lin 


11 


20, 32 
3.0, 1.2 


4.9 


+2dF-Q 


3.2 


5.2, 8.4 
1.1, 0.3 


2.6 


+2dF-lin 


4.6 


7.1, 10 

2.2, 1.1 


4.4 


+SDSS-LRG-Q 


3.5 


5.1, 7.4 
2.0, 1.1 


3.5 


+SDSS-LRG-lin 


4.0 


5.8, 9.4 
2.1, 1.2 


3.5 


+BAO 


3.5 


5.0, 6.8 

2.1, 1.1 


3.5 


+SNIa 


20 


34, 44 
6.4, 2.3 


5.9 


+HST 


3.9 


5.7, 8.3 
2.1, 1.2 


4.0 


+Lya 


7.6 


10. 13 

5.2, 3.6 


6.9 


All-lin 






2.9 


All-lin+HST 






2.8 


All-Q 






2.3 


All-Q+HST 






2.5 


All-Q+Lya 






4.4 


All-Q+Lya+HST 






3.9 



7.7 
2.1 

8.0 
2.0 

4.3 
1.1 

6.8 
2.1 

5.1 
2.0 

5.0 
2.1 

5.0 
2.1 

9.1 
2.8 

5.7 
2.4 

9.0 
4.9 



10 
1.0 

10 
0.7 

5.7 
0.4 

9.6 
1.1 

7.4 
1.1 

6.6 
1.3 

6.8 
1.1 

11 

0.9 

7.5 
1.4 

11 
3.5 



3.6 
3.6 
1.6 
2.9 
2.6 
2.6 
2.8 
4.3 
3.3 
6.8 



18, 34 

0.6, 0.0 

13, 28 

0.7, 0.3 

4.2, 7.5 
0.4, 0.0 

5.8, 9.5 

1.3, 0.6 

4.5, 6.9 

1.5, 0.8 

4.9, 8.4 

1.5, 0.7 

4.7, 6.4 

1.8, 0.8 

28, 42 

2.8, 0.4 

5.1, 7.7 

1.6, 0.8 

9.3. 12 

4.6, 3.3 



o 7 6.4, 9.7 
<5 ' 1.1, 0.7 



4.3 
1.4 



6.5, 9.9 

0.9, 0.5 

3.9, 5.5 

0.7, 0.0 



o 5.7, 9.4 

6 - z 1.4, 0.7 

4.5, 6.9 

5, 0.8 



2.5 i 



2-8 it 



4.6, 6.3 
1.1 

9 o 4.7, 6.4 
z '° 1.8, 0.8 



4.1 

3.6 
6.4 



8.7, 11 
2.4, 0.9 

5.3, 7.0 

2.1, 1.0 

8.8, 11 
4.6, 3.2 



4.0 



3.7 
1.9 

3.2 
1.4 

3.5 
1.6 

5.5 
3.3 

4.8 
3.0 



5.3 
1.1 

4.9 
1.3 

4.4 
0.7 

4.3 
1.0 

6.9 
2.4 

5.9 
2.3 



R 3.7, 5.1 
Z -° 1.5, 0.9 



2.6 
2.0 

2.4 
4.4 
3.8 



3.6, 4.8 

1.8, 1.1 

3.1, 4.1 

1.2, 0.5 

3.3, 4.3 

1.6, 0.9 

5.4, 6.6 
3.2, 2.3 

4.7, 5.8 

2.9, 2.2 



correction formula (4.2), generally produce consistent results. The agreement between 
WMAP+SDSS-DR2-lin and WMAP+SDSS-DR2-Q, and between WMAP+SDSS-LRG- 
lin and WMAP+SDSS-LRG-Q are excellent, suggesting that the effects of scale- 
dependent biasing have been successfully ameliorated. The WMAP+2dF-lin and 
WMAP+2dF-Q results do show a slight discrepancy at roughly the 68% level. This can 
most likely be put down to statistical fluctuations, but recall that the bias correction 
formula (4.2) has not been tested for nonstandard cosmologies and its application here 
is, strictly speaking, experimental. 

The analyses of Seljak et al. [9] and Mangano et al. [13] found a very high 
N cS = 7.8 \ 6 for WMAP+SDSS-DR2+SNIa, which can only be accommodated 
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Figure 2. The 2D marginal 68% and 95% allowed regions in the minimal model 
for iVefi and Q nh using the data set WMAP+SDSS-DR2-Q+SNIa and prior 2. The 
horizontal dotted lines indicate the la range of the Gaussian prior Q n \ — 10 ± 5. 



within our corresponding MCI estimates, iV eff = 3.7 9 7 7 for WMAP+SDSS-DR2- 
Q and N cS = 4.3 9 9 5 for WMAP+SDSS-DR2-lin, at more than the 68% level. 
Both groups used the bias correction formula (4.2), but adopted the Gaussian prior 
Q n i = 10 ± 5, a range supposedly determined from numerical simulations, although no 
source is cited. As a test, we have performed a fit of WMAP+SDSS-DR2-Q+SNIa using 
the same Gaussian prior on Q n \. We find N eS = 6.2 ± { 19 (MCI) and N cS = 7.0 9 ' 9 ' ^2 
(CCI), which include the high N efi values of Refs. [9, 13] in the 68% region. Excluding 
SNIa from the fit yields essentially the same constraints. 

These test results clearly indicate that the choice of Q n \ prior plays an important 
role in the inference of N e g. In this case, the choice of Q n \ = 10 ± 5 tends to push the 
preferred iV e fT to higher values. We are not able to reproduce the very tight error bars 
for N e ff reported in Refs. [9, 13], which may be due to different priors assumed for the 
marginalised parameters, or because of a slightly larger k max ~ 0.15 h Mpc -1 adopted 
in these analyses. However, we also observe a peculiar feature in their credible intervals: 
the 68% interval is some three times smaller than the 95% interval. This suggests 
some highly non-Gaussian behaviour in their marginal posterior for N e g, because in a 
Gaussian distribution, the ratio of the intervals is 1 : 2. 

The dependence on the Q n \ prior traces its origin to a degeneracy between N eS and 
Q n \. Figure 2 shows the 2D marginal 68% and 95% allowed regions in A^fr-Qni-space 
for the data set WMAP+SDSS-DR2-Q+SNIa. Evidently, imposing the restrictive prior 
Q n i = 10 ±5 cuts off much of the parameter space that favours low values of N e g. To our 
knowledge no simulation of mock galaxy catalogues involving a nonstandard N e g value 
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has ever been reported in the literature. Without the backing of simulations (or other 
independent input) there is no justification to impose a restrictive prior on Q Q \ when 
performing a fit with N e g as a free parameter. The best strategy in such circumstances 
is to use a broad and uniform prior on Q n \, as adopted in our analysis and also advocated 
in Ref. [1]. 

To summarise, we find that imposing a Q n \ — 10 ± 5 prior for the WMAP+SDSS- 
DR2-Q+SNIa fit biases the preferred N cS to higher values. This may account for the 
difference between our result and those reported in Refs. [9, 13]. || 

6.4- Combining all data sets 

Having identified and corrected the problematic issues, we now turn to our own N e s 
estimates. An inspection of table 2 reveals that, except for those sets including Lja, 
none of the combinations of probes shows any significant evidence for iV eff ^ 3.046, a 
value that always sits comfortably within the 68% MCI. The combination of all linear 
data together with HST (All-lin+HST) gives N cS = 2.6 f*. Discarding HST leaves 
the best fit unchanged, but slightly loosens the credible intervals. 

Including nonlinear data in the galaxy power spectrum tends to reduce the numbers 
a little to iVeff = 2.0 (All-Q), essentially because 2dF-Q prefers a low N e g. 

Adding HST shifts it up again to N c g — 2.4 ^g' g. We repeat that the bias correction 
formula (4.2) may not be applicable in nonstandard cosmologies so that numbers from 
the Q sets must be interpreted with caution. 

Another interesting feature is that, with the exception of WMAP+2dF-Q, all 
combinations of data sets prefer a nonzero N e g at the 95% level or better. This is 
in contrast to the results of Ref. [12], which finds no lower 95% limit from the WMAP 
data alone. We have not investigated where the differences come from. As mentioned 
before, the WMAP+2dF-Q data set tends to prefer lower values of N c s and as such 
produces no lower 95% limit on N cS . 

The Lja data appear to be the only data set that prefers a much larger value of 
N cS , with WMAP+Lya disfavouring N cS = 3.046 at 95%. When combined with other 
data sets, however, the evidence against N eS = 3.046 is weakened to the 68% level, 
N e ff = 3.8 2 '.g] 2^2 f° r All-Q+Lya+HST, because 2dF-Q's preference for small N cS values 
tends to pull in the opposite direction. 

The origin of Lya's preference for large values of N e g can be gleaned from figure 3. 
The Lja data prefer a much higher amplitude of density fluctuations at small scales, 
quantified by a%, than other data sets. This is particularly evident in the bottom panels 
of figure 3. The higher og value required by Lja forces N e g upwards and cuts away 

|| For completeness, we quote here the constraints on Q n \ derived from WMAP+SDSS-DR2 using 
19 data bands (i.e., fc max ~ 0.2 h Mpc -1 ) in the vanilla model: Q n i = 15^4 (68% C.L.). Here, 
five additional data points at large k values allow one to place much tighter constraints on Q n \ than 
is possible with only 14 data bands used in, e.g., figure 2. This result should be compared with 
Qni = 30tti for WMAP+SDSS-LRG (20 bands) [1] and Q nl = 4.6 ± 1.5 for WMAP+2dF (36 bands) 
[26] for the same model. 



Observational bounds on the cosmic radiation density 



17 




Figure 3. The 2D marginal 68% and 95% allowed regions in the minimal model for 
the indicated pairs of parameters. Plots in the left column use the All-Q+HST data 
set, while those in the right column include also Lya (All-Q+Lya+HST). 

the allowed region for low N e g values. As can be seen in the same figure, with the 
inclusion of Lya, the upper bound on N e g comes mainly from the HST prior on H . 
Since N c s and H both control the epoch of matter-radiation equality and are thus 
strongly degenerate, a large N e s can only be accommodated by a high value of H . 
However, such high values are strongly disfavoured by the HST data. 

The overall shift in the allowed range for N e g between WMAP+Lya and All-Q+Lya 
also points to the fact that the SDSS-Lya data is not completely compatible with other 
data sets (see, e.g., Refs. [9,43]). 

6.5. Towards Gaussianity 

A striking feature in table 2 is that when all data sets are combined, the three different 
statistical methods give almost identical results. The reason is that the combination 
of CMB, LSS, and SNIa data effectively breaks all parameter degeneracies and yields a 
posterior distribution that is very close to Gaussian, a limit in which all three methods 
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Table 3. Point estimates and credible intervals (68% and 95%) for N c g in four 
extended model spaces. In the top segment, the minimal vanilla+-/V c ff model is 
extended with f v and f u +a s +w (N m = N e s), while in the middle segment the 
extensions are 3 /„ and 3 f v +a s +w (N m = 3.046) as defined in section 2. The 
bottom segment contains results for the minimal model copied from table 2. The 
priors for the free parameters are given in table 1. The columns headed "prior 2" 
use a top hat prior 0.4 < h < 1.0, while those with "+HST" use in addition the 
HST result. The data sets used are All-lin = WMAP+BAO+SNIa+LSS-lin and 
All-Q = WMAP+BAO+SNIa+LSS-Q. 



Model 



Data 



Bayesian CCI 

/ at v 68%T, 95%T 
\ Jv eff/ 68%;, 95%; 



Prior 2 



+HST 



Bayesian MCI 

(1) 68%T, 95%T 



N. 



eff 68%;, 95%; 



Prior 2 



+HST 



Maximisation 

iV eff la;, 2ai 



Prior 2 



+HST 



+f v +a s +w 
+f v +a s +w 



All-lin 
All-Q 
All-lin 
All-Q 



5.6, 8.2 

2.5, 1.5 

5.0, 7.0 

2.2, 1.1 

5.3, 8.1 
2.0, 1.0 

4.9, 7.1 



4.0 
3.6 
3.7 

' 1.8, 0.9 



3.7 
3.5 
3.7 



4.9 
2.6 

4.5 
2.4 

5.1 
2.3 



3.3 I 



4.6 



6.3 
1.8 

5.8 
1.7 

6.6 
1.4 

6.3 
1.0 



3.2 
2.9 
3.1 
2.3 



5.0, 7.8 

2.0, 1.1 

4.7, 6.6 

1.9, 0.8 

4.9, 7.6 

1.6, 0.4 



3.6 
3.2 
2.6 



4.7, 6.1 

2.4, 1.6 

4.3, 5.6 

2.2, 1.5 

4.7, 6.4 

2.0, 1.2 



3.0 
3.2 
2.5 



4.6, 6.2 
2.0, 1.1 

3.8 



4.2, 
1.3, 



6.8 
0.5 



o n 4.3. 6.1 
1.7 



0.8 



2.6 { 



5.5 
1.3 

5.5 
0.8 

5.1 
0.5 



3.7 
3.0 
3.0 



4.1, 5.7 
2.5, 1.8 

3.8, 



2.3, 

3.6, 
2.3, 



5.3 
1.6 

5.5 
1.1 



9 a 4 2 > 51 
z - y 1.7, 1.0 



+ 3 .u 

+ 3 .f„+a s +w 
+ 3 .U+a s +w 



All-lin 
All-Q 
All-lin 
All-Q 



4.9 
4.4 



5.3. 8.0 
3.0, 3.0 

4.6, 7.1 



3.0, 3.0 

r , 5.3, 9.4 

°' 1 3.0, 3.0 

4.7, 7.3 

3.0, 3.0 



4.4 



4.4 
4.2 
4.4 
4.1 



4.8 
3.0 

4.5 
3.0 

4.7 
3.0 

4.3 
3.0 



6.7 
3.0 

6.1 
3.0 

6.7 
3.0 

5.8 
3.0 



3.2 
3.0 



5.3. 8.0 

3.0, 3.0 

4.6. 7.1 

3.0, 3.0 



3.2 
3.2 



4.8. 6.7 

3.0, 3.0 

4.5, 6.1 

3.0, 3.0 



3.0 ! 
3.0 



3.0, 3.0 

4.7, 7.3 
3.0, 3.0 



3.2 



3.0, 3.0 

4.3, 5.8 
3.0, 3.0 



3.0 
3.0 
3.0 
3.0 



3.8 
3.0 

3.9 
3.0 

3.9 
3.0 

3.7 
3.0 



5.7 
3.0 

5.2 
3.0 

6.4 
3.0 

5.0 
3.0 



3.0 
3.0 



3.9. 5.7 

3.0, 3.0 

3.7. 5.0 

3.0, 3.0 



3.2 ' 
3.0 



3.0, 3.0 

3.8, 5.0 
3.0, 3.0 



Minimal 
Minimal 



All-lin 2.9 
All-Q 2.3 



4.0, 5.3 
1.8, 1.1 



3.2, 
1.4. 



4.4 
0.7 



2.8 
2.5 



3.7 
1.9 



3.5 
1.6 




5.0 
0.8 

4.0 
0.6 



2.7 
2.2 



3.2. 4.5 

2.0, 1.1 

2.7. 3.8 

1.6, 0.9 



must give the same result. The lower half of table 2 nicely confirms this expectation. 
7. Extended models 

We now consider constraints on N e ff in the context of extended models that allow also 
for nonvanishing neutrino masses. As in the case of the minimal model, we calculate 
the bounds within a conservative approach using only linear data (All-lin), as well as a 
more speculative one that utilises the stronger, but more model-dependent All-Q data 
set. Since, as we saw in section 6, N eS exhibits a strong degeneracy with the Hubble 
parameter H in some data sets, we consider both options of including and excluding 
the HST data in our analysis. We do not use the Lya data for the extended models. 
Table 3 shows our constraints on N e g for four choices of extended models: vanilla+iVeff 
extended with f U) f u +a s +w, 3 f u , and 3 f u +a s +w. 



Observational bounds on the cosmic radiation density 



19 



7.1. N m = N cS 

Consider first the top half of table 3. The two extended models have, respectively, 
vanilla+A^flf+Z^ and vanilla+7V e ff+/ l/ +ai s +'u; as free parameters. Also in place is the 
condition N m = iV eff , meaning that all N cS neutrinos have equal masses m„. In both 
cases, it is evident that some new degeneracies have arisen with the introduction of 
additional free parameters; the marginal posteriors for N e g are not perfect Gaussians 
for the All-lin and All-Q data sets, as indicated by the fact that their associated credible 
intervals from different constructions do not exactly overlap. However, none of the All- 
lin and All-Q results show any significant deviation from the standard N ef[ = 3.046, and 
adding the HST data essentially serves to tighten the bounds. 

It is interesting to note that, in the case of the smaller vanilla+iV eff +/' l/ model, 
adding the HST data brings the marginal posterior for iV eff much closer to the Gaussian 
limit, so that the three different credible interval construction methods give almost 
identical results. Our best estimate is N e ff = 3.2 £2 L5 (All-Q+HST), values that are 
somewhat larger than those found in the minimal vanilla+A^ff model for the same data 
set, N e ff = 2.4 'g, because of a degeneracy between N e g and f v . 

For the even larger vanilla+iVeff+/j,+a: s +K; model, an additional degeneracy 
between N e g and w comes into play so that the posterior for N e g becomes more non- 
Gaussian. For All-Q+HST, for example, even though the MCI and the CCI have more 
or less converged (thus indicating a symmetric marginal posterior), the limits from 
maximisation are still very different. As our formal bound we use the MCI estimate 
for All-Q+HST, N cS = 3.0 but also note that all three methods give credible 

intervals that are compatible with N e g = 3.046 at better than 68%. Thus, as was the 
case for the minimal model, there is no evidence for any nonstandard value of N e g. 

Figure 4 shows the 2D marginal contours in the J2 m v~N e K plane for the extended 
model vemilla+Neff+fv+as+w and the data set All-Q+HST. Some degeneracy persists 
between X m v and N e s, in contrast to earlier results from some of us [10]. The difference 
can be traced to a generally more conservative approach taken in the present work, 
particularly with regard to scale-dependent biasing, as well as a different statistical 
methodology (Bayesian marginalisation vs maximisation). 

7.2. N m = 3 

The bottom half of table 3 shows constraints on N eS for essentially the same two classes 
of models, vanilla+iV cff + 3 /' i , and vanilla+iV cff + 3 / 1 ,+a s +-u;, except we now impose the 
condition N m = 3, representing models with three massive neutrinos and N c s — N m 
massless species. This model is different from that presented above in section 7.1 because 
there is now a hard lower limit of N c g = 3. 

The presence of a hard limit can in principle lead to some very disparate credible 
intervals from the three different construction methods. In the present case, however, 
the ID marginal and profile posteriors for N e s both peak at or very near the limit. It is 
therefore more useful to report, instead of a CCI, an upper 1007% limit constructed by 
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Figure 4. The 2D marginal 68% and 95% allowed regions in m v and N c g in 
the extended model vanilla+-/V c ff+/', y +a! s +u>, using the data set All-Q+HST. The 
corresponding contours for the model vanilla+A , c ff+ 3 /y+as+w are similar, but with 
a cut-off at Nes = 3. 

requiring that a fraction 7 of the marginal posterior's volume lies to the left of the limit. 
This construction is also a default setting of GetDist for parameter estimation in the 
presence of hard limits. For simplicity, however, we shall continue to label an interval 
thus constructed as a CCI. The definitions of an MCI and a maximisation interval are 
the same as before. 

The fact that the marginal posterior for N e g peaks at or very near the hard limit 
also means that, although the posterior mean and mode still differ, the CCI and the MCI 
will coincide, as is clearly shown in the bottom half of table 3. All estimates indicate 
that N e g = 3.046 sits safely within the 68% region. Our best estimate for the smaller 
v&m\\a+N e{i + 3 f u model, based on the MCI, is N eS = 3.2 £g; g;J (All-Q+HST), while for 
the larger vani\\a+N eS + 3 f u +a s +w model we find N cS = 3.2 3^' |o using the same data 
set. 

8. Conclusions 

Motivated by several recent, seemingly conflicting inferences of the cosmic radiation 
density (traditionally parameterised as the effective number of neutrino species N eS ) 
from cosmological observations, we have re-examined the issue of cosmological N e s 
determination in great detail and identified the reasons for the apparent discrepancies. 
Using a minimal model with N e s as the only nonstandard parameter (i.e., 
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vanilla+iV cff ), we find that the treatment of scale-dependent biasing in the galaxy power 
spectrum data is crucial to the derived value of N c g. The very high values of N e Q 
found in Refs. [9, 13] for the WMAP+SDSS-DR2+SNIa data and the same model can 
be traced to their treatment of the Q n \ parameter which quantifies the level of bias 
correction. The prior on Q n \ imposed in these studies, Q n \ = 10 ± 5, is significantly 
more restrictive than the parameter space allowed by the WMAP+SDSS-DR2-Q data. 
Because of a degeneracy between N eS and Q n \, such a restrictive prior cuts out much of 
the parameter region that favours low values of N c s and consequently biases the inferred 
N eS towards high values (figure 2). The use of restrictive priors on Q ni is unjustified 
when fitting nonstandard cosmologies, unless the priors have been verified/ supplemented 
by simulations or other means under the same model assumptions. In the absence of 
such information, the best strategy is to use broad and uniform priors. 

When the WMAP measurements are combined with any other single data set 
(LSS, BAO, SNIa, or HST), we find that the inferred N cS is always compatible with 
the standard value N cS = 3.046 at 68% C.L. or better, except for the combination 
WMAP+Lya, which yields a high N cS value in disagreement with 3.046 at more than 
95%. The reason Lya prefers a high N c s originates in a well-known discrepancy in the 
inferred small-scale fluctuation amplitude between the SDSS-Lya and the WMAP data. 
This can be understood from our figure 3. 

When all data sets (except Lya) are used in combination, we find tighter bounds 
on N e f[ that are, again, compatible with N c g = 3.046 at better than the 68% level. 
This finding is independent of whether we use galaxy power spectrum data only in the 
strictly linear regime or also at higher values of k, as long as scale-dependent bias is 
correctly taken into account. When Lya is added to the fit, the inferred iV eff is again 
shifted to higher values because of Lya's normalisation discrepancy with WMAP. As 
discussed in section 3.6 this discrepancy is most likely due to unaccounted systematics 
in the Lya data. For this reason we quote a result without Lya, N c s = 2.6 ^ i®, as 
our best current estimate of the constraints on N c g in the minimal vanilla+iVefj model 
from WMAP+LSS-Q+BAO+SNIa+HST. 

Another very interesting point is that the statistical method used to construct 
credible intervals can have a strong impact on parameter inference when the posterior 
probability is non-Gaussian. Using an inappropriate interval construction can sometimes 
lead to incorrect inferences. This is especially true when fitting data sets that are not 
very constraining and therefore contain strong parameter degeneracies. However, when 
all available data sets are used in combination, they conspire to break each other's 
degeneracies. The ID posterior for N e g in the minimal model approaches the Gaussian 
limit, and all three interval constructions used in our analysis, the Bayesian central and 
minimum credible intervals, and the non-Bayesian concept of maximisation, give almost 
identical results in this case. 

New parameter degeneracies arise when more free parameters are introduced in 
extended models. Even when the parameter inference is performed with all data sets 
combined, there is still some, albeit small, differences in the credible intervals obtained 
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from the different methods. We have considered several different extended models in 
the present work, all including nonzero neutrino masses as a free parameter. While 
the formal constraints on jV e ff differ slightly from model to model, we find again that 
Nes = 3.046 is always compatible with data at the 68% C.L. or better, as long as 
we exclude the Lya data. Because of the additional parameters the formal bounds on 
N e g are somewhat relaxed relative to those derived for the minimal model. For our 
most general model (i.e., vaniila+7V e ff+/ i ,+a! s +'u;, with iV eff equally massive neutrinos), 
we find JV e ff = 3.0 based on the minimum credible interval, using the data set 

WMAP+LSS-Q+BAO+SNIa+HST. 

We consider also the case in which the total radiation density is split into three 
massive species and N eS — 3 strictly massless ones. In this case we find almost identical 
upper bounds on N e g as in the previous case with N c g massive species (the lower 
bounds here are now always 3.0). Extra radiation density corresponding to at least 
one extra neutrino degree of freedom is allowed by all data sets at the 95% level. Thus, 
cosmological observations are not yet at a precision level sufficient to exclude very light 
sterile neutrinos, axions, majorons, or similar particles that were in thermal equilibrium 
after the QCD phase transition. With future CMB and weak gravitational lensing data 
this situation is set to change. For instance, with data from Planck and the future wide- 
field weak lensing survey LSST, a sensitivity of o~(N e g) ~ 0.07 can be achieved [44]. 
Cosmology will then become an even more powerful probe of particle physics beyond 
the standard model. 
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